Introduction

Frailty is an aging-related syndrome of cumulative physical and physiological decline, resulting in burden of symptoms like weakness and fatigue, greater medical complexity, and lower tolerance to medical and surgical interventions as compared to the general population [1, 2]. This is especially true in cancer, where frailty increases patients’ risk for treatment-related toxicity, treatment discontinuation due to toxicity, disease progression, hospitalization, and death [3,4,5,6]. As such, there is growing recognition of the potential benefits to using frailty assessments to help guide anti-cancer treatment decision making [7,8,9,10]. Studies suggest that consideration of patients’ frailty level when selecting treatment, adjusting doses, and administering supportive care may help reduce toxicity and improve tolerability [5, 11,12,13,14]. Importantly, incorporating an assessment of frailty into clinical decision making may increase access to anti-cancer treatment among “fit” patients who may not have otherwise qualified for treatment due to their age alone [15,16,17].

Multiple approaches have been proposed to detect frailty and guide treatment decisions in older oncology patients. For instance, numerous studies have shown that use of the Comprehensive Geriatric Assessment (CGA), as opposed to standard care, can reduce grade 3–5 chemo-related toxicity, functional decline, and mortality in oncology patients [13, 14, 18]. However, with nine domains, numerous sub-instruments, multiple items per instrument, and the need for specialist training and/or multidisciplinary collaboration, CGAs can be time consuming and resource intensive. This complexity and burden have led some to question the practicality of CGAs for routine use in clinical trial and clinical practice settings [3, 4, 19,20,21,22].

Another method for frailty screening is the Fried Frailty Phenotype. In their landmark study, Fried et al. defined frailty using a cluster of five variables (unintentional weight loss, self-reported exhaustion, low physical activity/energy expenditure, slow gait speed, and weak grip strength) and determined that frailty, as measured by the Fried Frailty Phenotype, was predictive of falls, worsening mobility, hospitalization, and death [23]. A third well-known approach to measuring frailty is the cumulative deficit model or Frailty Index. This model uses 30 to 40 variables that include medical conditions, lab values, symptoms, health attitudes, and functional impairments to calculate a Frailty Index score for each patient, where higher scores indicate greater risk for adverse outcomes [1].

While both the Fried Frailty Phenotype and the Frailty Index are simpler to use than a full CGA and have been validated in oncology patient populations [24,25,26,27], they each have limitations. The Fried Frailty Phenotype requires an in-person objective assessment of grip strength and gait speed by a clinician while the Frailty Index can be difficult to operationalize in clinical trials and clinical practice, given the exhaustive number of variables which need to be consolidated from different data sources (e.g., comorbid conditions and medical history from Electronic Medical Records, symptom report from patients, assessments by clinicians, etc.) [3]. Alternatively, frailty measures that rely solely on patient self-report shift data capturing to the patient and foster patient-centricity [28,29,30,31]. A resourceful approach to measuring frailty through patient self-report would be to leverage data from routinely administered patient-reported outcome (PRO) assessments. For instance, the Fried Frailty Phenotype includes criteria like exhaustion and low physical activity which are similar to concepts captured in PRO instruments commonly used in cancer trials such as the European Organization for Research and Treatment of Cancer (EORTC), Patient-Reported Outcomes Measurement Information System (PROMIS), and Functional Assessment of Chronic Illness Therapy (FACIT) item libraries. This presents an opportunity to explore adaptation of pre-existing PRO data to measure frailty in clinical trials and clinical practice.

The European Organization for Research and Treatment of Cancer Quality of Life 30-item Questionnaire (EORTC QLQ-C30) is commonly deployed in global cancer trials submitted for regulatory decision making [32, 33]. The objective of our retrospective study was to determine the feasibility of measuring frailty using patient responses to relevant EORTC QLQ-C30 items as proxy criteria for the Fried Frailty Phenotype, in a cohort of patients with Multiple Myeloma (MM). MM is a hematologic malignancy of plasma cells, resulting in hypercalcemia, bone disease, anemia, and renal dysfunction [34]. MM serves as an ideal case example because it is a cancer of older adults who represent a highly heterogenous population with varied functional capacities, health status, and ability to tolerate intensive treatment regimens [16]. A number of recent studies have investigated MM-specific frailty measures, including an ongoing trial to determine the impact of frailty-adjusted therapy on clinical outcomes [15, 16, 35,36,37]. Given the rising interest around frailty in MM and the need for a simple yet resourceful approach to measuring frailty, we conducted this proof-of-concept study to identify relevant EORTC QLQ-C30 items to serve as proxy criteria for the Fried Frailty Phenotype and thus determine patients’ level of frailty.

Methods

Study Design and Participants

U.S. Food and Drug Administration (FDA) internal databases were retrospectively searched to identify Phase III randomized clinical trials that were submitted for regulatory review and approved between 2010 and 2021 for the treatment of MM. Baseline data were pooled for this analysis from nine clinical trials that met inclusion criteria. Trials were included if they incorporated the EORTC QLQ-C30 questionnaire. We also limited this analysis to trials in the relapsed/refractory treatment setting as Relapsed/Refractory Multiple Myeloma (RRMM) patients tend to be older and more medically complex than newly diagnosed patients. Patients were included in this analysis if they had completed the EORTC QLQ-C30 questionnaire in its entirety at baseline (prior to randomization and treatment initiation).

Scale Construction

Co-authors selected candidate EORTC QLQ-C30 items that measure concepts identical or similar to each Fried frailty criterion, in order to derive a five-item patient-reported frailty phenotype (PRFP). For example, EORTC QLQ-C30 item #12, “Have you felt weak?”, was identified as a proxy for the “weakness” Fried frailty criterion, which is typically captured by measuring grip strength in the clinic. If a given Fried criterion had more than one potential matching candidate EORTC QLQ-C30 item, polychoric correlation coefficients were computed in order to select the item with the highest correlations to the EORTC QLQ-C30 items corresponding to the other frailty criteria (Table 1). Patient responses to each of the final EORTC QLQ-C30 items selected were dichotomized to determine the presence/absence of their corresponding Fried frailty criteria. Two dichotomization approaches were considered: (a) “Not at all” = absent and “A little”/“Quite a bit”/“Very much” = present, and (b) “Not at all”/“A little” = absent and “Quite a bit”/“Very much” = present. Sensitivity analyses were conducted to select the final dichotomization approach. Approach (b) was selected due to its congruence with past literature on the prevalence of frailty in MM and its clinical plausibility [28, 38, 39]. Patients were classified as fit if they met none of the five frailty criteria, pre-frail if they met one or two of the frailty criteria, and frail if they met three or more frailty criteria.
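The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R). It assumes the standard EORTC QLQ-C30 response coding (1 = “Not at all” to 4 = “Very much”) and the item numbers named elsewhere in this article (#3, #6, #10, #12, #13). The pairing of item #3 with the slowness criterion, the criterion labels, and the function names are assumptions made for illustration.

```python
# Illustrative sketch only; names and the item-to-criterion labels are assumptions.
# Responses use the standard QLQ-C30 coding:
# 1 = "Not at all", 2 = "A little", 3 = "Quite a bit", 4 = "Very much".
PRFP_ITEMS = {
    "weakness": 12,               # "Have you felt weak?"
    "exhaustion": 10,             # "Did you need to rest?"
    "low_physical_activity": 6,   # limited in work or other daily activities
    "slowness": 3,                # trouble taking a short walk (assumed pairing)
    "weight_loss": 13,            # lacking appetite, used as an approximation
}


def criterion_present(response, threshold=3):
    """Dichotomize one response; threshold=3 means "Quite a bit" or worse."""
    if response not in (1, 2, 3, 4):
        raise ValueError(f"invalid QLQ-C30 response: {response!r}")
    return response >= threshold


def classify_prfp(responses, threshold=3):
    """Classify a patient from a mapping of item number -> response (1-4)."""
    met = sum(
        criterion_present(responses[item], threshold)
        for item in PRFP_ITEMS.values()
    )
    if met == 0:
        return "fit"
    if met <= 2:
        return "pre-frail"
    return "frail"


if __name__ == "__main__":
    # Hypothetical patient, not study data.
    patient = {12: 3, 10: 4, 6: 3, 3: 1, 13: 1}
    print(classify_prfp(patient))
```

Passing `threshold=2` reproduces the alternative dichotomization, in which any response other than “Not at all” counts as the criterion being present.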

Table 1 Patient-Reported Frailty Phenotype (PRFP)—Fried frailty criteria, corresponding candidate items considered, and final items selected to constitute the PRFP

Statistical Analysis

Descriptive statistics were used to summarize clinical and demographic characteristics of patients included in this analysis as well as to determine the number and proportion of patients categorized as frail, pre-frail and fit based on the PRFP model.

The PRFP was constructed based on a reflective indicator model with an assumption of unidimensionality, since each of its items is a reflection or an effect of the single underlying concept of interest, frailty. Structural validity of this single-factor model was tested using Confirmatory Factor Analysis (CFA). The model was estimated using a polychoric correlation matrix and the weighted least square means and variances (WLSMV) fitting function. Model fit was evaluated using the following fit indices: (a) chi-square, (b) root mean square error of approximation (RMSEA), (c) comparative fit index (CFI), (d) standardized root mean square residual (SRMR), and (e) Tucker Lewis Index (TLI). Acceptable model fit was indicated by a non-significant chi-square statistic, RMSEA ≤ 0.10, CFI > 0.95, SRMR ≤ 0.08, and TLI > 0.95 [40, 41]. In addition, R2 values (squared multiple correlations) were computed as part of the CFA, in order to determine the proportion of variance in each item explained by the latent construct of frailty. R2 > 0.5 indicated a strong association between each item and frailty.

Internal consistency reliability (i.e., the degree to which selected items were inter-related and measured different aspects of the same underlying construct, frailty) was assessed using Cronbach’s α, with reliability categorized as acceptable (0.70–0.79), good (0.80–0.89), or very reliable (≥ 0.90) [42, 43]. Ninety-five percent confidence intervals were calculated for Cronbach’s α.
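As an illustration of the reliability statistic, the following Python sketch computes Cronbach’s α from a patients-by-items score matrix using the textbook formula α = k/(k − 1) × (1 − Σ item variances / variance of total score). It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the confidence interval is not computed, and the label for values below 0.70 is an assumption, since the text does not name that category.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-patient item-score lists."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item (column) and of the per-patient total score.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def reliability_label(alpha):
    """Map alpha to the categories used in the text."""
    if alpha >= 0.90:
        return "very reliable"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "acceptable"
    return "below acceptable"  # assumed label; not defined in the text


if __name__ == "__main__":
    # Hypothetical responses for three patients on two items, not study data.
    scores = [[1, 2], [2, 1], [3, 3]]
    print(cronbach_alpha(scores), reliability_label(0.85))
```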

Known groups validity was assessed to determine whether the PRFP could be used to discriminate between distinct groups which were known to differ on variables of interest like age, Charlson Comorbidity Index (CCI), mobility, self-care, and engagement in usual activity as measured by the EuroQol-5 Dimension (EQ-5D) [44]. Five hypotheses were identified a priori to determine known groups validity:

  1. Frail patients were likely to be older than Pre-Frail and Fit patients.
  2. Frail patients were more likely to have a Charlson Comorbidity Index (CCI) score ≥ 2 as compared to Pre-Frail and Fit patients.
  3. Frail patients were more likely to report problems with mobility on the EQ-5D mobility subscale as compared to Pre-Frail and Fit patients.
  4. Frail patients were more likely to report problems with self-care on the EQ-5D self-care subscale as compared to Pre-Frail and Fit patients.
  5. Frail patients were more likely to report problems with their usual activities on the EQ-5D usual activities subscale as compared to Pre-Frail and Fit patients.

Known groups analyses for the variables of age and CCI included the entire study sample. However, only six of nine trials incorporated the EQ-5D 3-Level or 5-Level. Thus, known groups analyses for mobility, self-care, and engagement in usual activities as measured by the EQ-5D were limited to the subgroup of patients from the six trials which included the EQ-5D in their study protocol. Group differences were determined using the chi-square test of independence. All analyses were done using R Studio (version 3.6.1) statistical software and two-sided significance levels were set at 0.05.
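The group comparison can be sketched as a Pearson chi-square test of independence on a frailty-group by outcome contingency table. The Python below is an illustration only: the authors used R, the function names are hypothetical, and the counts in the usage example are invented for demonstration and are not study data. The p-value helper applies only to 2 degrees of freedom (the case for three frailty groups and a binary outcome), where the chi-square survival function has the closed form exp(−x/2).

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_df2(statistic):
    """Upper-tail p-value, valid only when df == 2."""
    return math.exp(-statistic / 2)


if __name__ == "__main__":
    # Hypothetical counts: rows = frail, pre-frail, fit;
    # columns = problem reported, no problem reported.
    counts = [[30, 10], [20, 20], [10, 30]]
    stat, df = chi_square_independence(counts)
    print(stat, df, p_value_df2(stat) < 0.05)
```

For tables with other degrees of freedom, a statistics library would be needed to obtain the p-value.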

Results

Out of 5,272 patients randomized across nine trials, 4,928 patients (93%) completed the EORTC QLQ-C30 at baseline and met inclusion/exclusion criteria. The median (min–max) age of patients in this study was 65 (30–91) years (Table 2). The majority of patients were white (82.4%), male (55.4%), and located outside the United States (93%). Over three quarters of patients in this analysis had a baseline Eastern Cooperative Oncology Group Performance Status (ECOG PS) score of 0 or 1, and their median (interquartile range) EORTC QLQ-C30 Physical Functioning Subscale summary score was 73 (53–87) out of 100. Approximately half of these patients had RRMM at International Staging System (ISS) disease stage I and had received two or three prior lines of treatment at baseline. The majority (60%) of patients had no comorbid conditions at study entry, as indicated by their CCI score of 0.

Table 2 Distribution of sociodemographic and baseline clinical characteristics of patients (n = 4928)

We were able to identify potential matching candidate EORTC QLQ-C30 items for each of the Fried criteria. Some criteria had multiple candidate items. For instance, exhaustion corresponded with both item #10 “Did you need to rest?” and item #18 “Were you tired?” (Table 1). When multiple candidate items existed for a given Fried criterion, polychoric correlations were used to select a final EORTC QLQ-C30 item. Polychoric correlations for the final selected items ranged from 0.48 (between items #3 and #13, i.e., trouble taking a short walk and lacking appetite) to 0.74 (between items #6 and #10, i.e., limitations in doing work/daily activities and needing rest) (Appendix A). In the case of the Fried criterion “low physical activity,” one of the alternate candidate items considered (item #2, “Do you have any trouble taking a long walk?”) had higher correlations to the other proxy items. However, the second most highly correlated item (item #6, “Were you limited in doing either your work or other daily activities?”) was ultimately selected as the final proxy in order to be relevant to respondents with disabilities and to capture a different aspect of physical functioning, since item #3 (ability to take a short walk) already captured the ability to walk. Applying the final items which constituted the PRFP to our study cohort, we found 2,729 fit (55.4%), 1,209 pre-frail (24.5%), and 990 frail (20.1%) RRMM patients.

Model fit indices obtained from the CFA demonstrated reasonable model fit for the proposed frailty model: the scaled (robust) chi-square for PRFP was χ2(df) = 171.83(5) (p ≤ 0.05) and RMSEA was 0.082 [95% CI: 0.072–0.093]. Similarly, the CFI was 0.99, SRMR was 0.03, and TLI was 0.99. Factor loadings for each of the PRFP items exceeded the recommended minimum of 0.50 [45], ranging from 0.66 to 0.88. Meanwhile, R2 values ranged from 0.43 to 0.77 (Fig. 1). PRFP also showed good internal consistency reliability [Cronbach’s α 0.85 (95% CI: 0.84–0.85)].
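As a rough consistency check on these figures, the Python sketch below recomputes RMSEA from the reported chi-square, degrees of freedom, and sample size, and compares the reported indices with the thresholds stated in the Methods. It assumes the common formula RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))); software that reports robust (scaled) statistics under WLSMV may apply a somewhat different correction, so agreement to three decimals should be read as indicative only. The function names are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square statistic (common formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def fit_summary(rmsea_value, cfi, srmr, tli):
    """Check each index against the thresholds stated in the Methods."""
    return {
        "RMSEA <= 0.10": rmsea_value <= 0.10,
        "CFI > 0.95": cfi > 0.95,
        "SRMR <= 0.08": srmr <= 0.08,
        "TLI > 0.95": tli > 0.95,
    }


if __name__ == "__main__":
    # Values as reported in the text: chi-square 171.83 on 5 df, n = 4928.
    estimate = rmsea(171.83, 5, 4928)
    print(round(estimate, 3))
    print(fit_summary(estimate, cfi=0.99, srmr=0.03, tli=0.99))
```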

Fig. 1

Confirmatory Factor Analysis Results for the Patient-Reported Frailty Phenotype (PRFP). The values between the arrows from Frailty to each item are the standardized regression coefficients or ‘factor loadings’, an indicator of the strength of the relationship between an item and the underlying construct (frailty). R2 values are displayed to the left of each item box, representing the variance of each item explained by frailty. R2 values > 0.5 are generally preferred [46]. Coefficients adjacent to the error circles on the right represent variance in each item explained by factors other than frailty. It is generally desirable to have more of the variance in each item explained by the construct of interest (in this case frailty) rather than by error/factors other than frailty.

Through known groups analysis, we found that PRFP could be used to distinguish between distinct patient populations in terms of their comorbidities and mobility, self-care, and engagement in usual activities, as measured by the EQ-5D. For instance, frail patients were far more likely to also report problems with mobility (88% of frail vs. 69% of pre-frail vs. 31% of fit patients, p < 0.05), self-care (54% of frail vs. 28% of pre-frail vs. 7% of fit patients, p < 0.05), and engaging in their usual activities (93% of frail vs. 76% of pre-frail vs. 33% of fit patients, p < 0.05), as compared to their Pre-Frail and Fit counterparts on the EQ-5D at baseline (Table 3). On the other hand, known groups analysis for the age variable only showed a weak association between frailty and age (3.4% of frail patients were > 80 years old while 3.5% of pre-frail patients were > 80 years old, and 3% of fit patients were > 80 years old, p < 0.05) (Table 3).

Table 3 Known Groups Analysis Results for the Patient-Reported Frailty Phenotype (PRFP)

Discussion

This feasibility study demonstrated that it is possible to leverage patient responses to routinely administered PRO assessments to measure frailty in MM patients. In our study of almost 5,000 RRMM patients from registrational clinical trials, we were able to adapt EORTC QLQ-C30 data to derive a patient-reported frailty measure based on the Fried Frailty Phenotype. We then used PRFP to determine the prevalence of frailty in our study sample. Our findings were in line with a previous prospective cohort study of RRMM patients, albeit one in which the International Myeloma Working Group (IMWG) frailty index was used [28]. Similar to our analysis, 53% of patients in their study were fit, 23% were pre-frail, and 24% were frail [28].

In evaluating the measurement properties of PRFP, we found that the proposed patient-reported frailty model was plausible; EORTC QLQ-C30 items that were selected as proxy Fried criteria were well correlated with one another and the measure as a whole demonstrated adequate internal consistency reliability and structural validity. The majority of CFA model fit indices evaluated met their recommended thresholds, with the exception of the chi-square test which was statistically significant. However, chi-square is sensitive to sample size and, given a large sample size, even small departures tend to be significant [46]. Thus, despite the statistically significant chi-square test statistic, our large sample CFA of PRFP lent support to its validity for measuring frailty.

Likewise, results from our known groups analysis supported the ability of PRFP to measure frailty. As hypothesized, PRFP could be used to detect distinct comorbidity levels and distinguish between different functional profiles, with frail patients reporting more difficulty in walking about, washing/dressing, and doing usual activities, than pre-frail and fit patients. However, we only found a weak association between frailty (as measured by PRFP) and age. In general, frailty is thought to correlate with age because of the natural accumulation of aging-associated diseases and disabilities [19]. Yet, we observed that the prevalence of frailty in our sample did not increase significantly with older age. This is perhaps because our study population consisted of clinical trial patients who met stringent inclusion/exclusion criteria for trial enrollment. Thus, inclusion of older patients, especially those above the age of 80, was fairly uncommon in these trials. Moreover, older adults who were included were likely fitter, healthier, and more active than average and therefore not representative of the general RRMM older adult patient population. Another explanation for why we did not observe a strong association between age and frailty in our study is that unlike other commonly used frailty measures in MM, our frailty model excluded age as a component [37]. However, we believe this may be a key strength of the PRFP model because not all patients over 80 are frail and not all frail patients are over 80. By excluding age as a component, the PRFP offers an alternative perspective on frailty that is based on functional deficits, rather than chronological age.

To our knowledge, this is one of the first studies to investigate a fully patient-reported frailty model in the context of MM. Although a number of frailty scales have been developed and tested in MM patients in recent years, they all rely, at least in part, on data collated from clinicians, observers, electronic health records, detailed clinical measurements, and/or laboratory tests [15, 37, 47,48,49,50,51]. Patients, given their lived experience with their condition, are well-equipped to self-report frailty symptoms. However, they remain an underutilized source of data when measuring frailty in oncology, with few studies exploring use of patient self-report to assess frailty in cancer [52, 53]. Meanwhile, multiple studies in the general non-cancer older adult population have demonstrated the feasibility, reliability, and validity of patient-reported frailty measures [29, 30, 54,55,56]. A key benefit to using patient self-report to measure frailty is that it incorporates the patient’s voice and is a more patient-centric approach [28]. In their recent study, Efficace et al. found a strong correlation between RRMM patients’ frailty status, as measured by the IMWG frailty index, and their self-reported physical functioning and symptom burden as captured by the EORTC QLQ-C30 [28]. The authors noted that fit, pre-frail, and frail patient groups had distinct patient-reported Health-Related Quality of Life (HRQoL) profiles and concluded that their work paves the way for a patient-centered frailty index [28]. Our feasibility study further builds on the groundwork laid by Efficace et al.

Adoption of CGAs and other traditional frailty assessments as the standard of clinical care has been slow due to barriers around time and resources [3, 4, 53]. For instance, in addition to clinical measurements like grip strength and gait speed, the original Fried Frailty Phenotype involves calculating kilocalories of energy expended using the Minnesota Leisure Time Activity instrument [23]. This calculation can be cumbersome and time consuming for routine use [56]. On the other hand, the EORTC QLQ-C30 is already deployed in more than 5000 studies worldwide each year and has been translated into and validated in over 100 languages [57]. This widespread use may help to further study patient-reported approaches to characterize frailty with minimal additional time, effort, and resources needed on the part of clinicians and/or clinical trial sites. In addition to its patient-centricity, this stream of patient-generated data may reduce clinical testing and assessment burden, making patient-reported frailty assessments a potential lower cost, practical solution for clinical practice and clinical trial settings [29,30,31, 56]. However, it must be acknowledged that PRFP and other similar frailty measures cannot replace a full CGA for the purpose of clinical decision making and treatment adjustment. Instead, a two-step approach could be considered in clinical practice where a patient-reported frailty measure would serve as an efficient and cost-effective preliminary screening tool to prospectively identify patients who are vulnerable to adverse outcomes and might benefit from a full CGA [4].

Our study has important implications from a clinical trial conduct perspective. In the absence of mandated dedicated frailty assessments, such as a CGA, frailty scales which leverage pre-existing PRO data may provide an alternative approach to obtaining important information on frailty status in a less burdensome way. While the EORTC QLQ-C30 is just one possible source of patient-reported data, it is advantageous in that it is frequently embedded into trial protocols and baseline PRO completion rates are typically high [58]. Moreover, PRO-based frailty scales are well positioned for incorporation into decentralized clinical trials, which have seen accelerated growth since the COVID-19 pandemic [59]. Advances in electronic capture of PROs facilitate the ability of a tool like PRFP to be seamlessly embedded into decentralized clinical trials because it can be self-administered, regardless of location.

Our study findings should be interpreted within the context of several limitations. Firstly, the example PRFP constructed in this exploratory study was developed as a proof of concept. Construction of a validated PRFP would require further psychometric evaluation prior to widespread adoption. Similarly, candidate EORTC QLQ-C30 items corresponding to each Fried frailty criterion were selected based on discussion among co-authors. However, we recognize that there is a need for future qualitative research with patients and subject matter experts to further verify and externally validate the selected proxy items which constitute PRFP. Another limitation is that the EORTC QLQ-C30 items selected in our analysis as proxy Fried frailty criteria are not assessed sequentially but asked intermittently throughout the EORTC QLQ-C30. There are also differences in recall, with four of the five items using a seven-day recall, and one item from the Physical Functioning subscale with no recall window. The effect of combining select items from across the EORTC QLQ-C30, and of combining items that have a recall window with those that do not, into a single composite frailty measure is unknown and warrants further prospective research and validation. Next, concordance between PRFP and other widely accepted frailty measures in MM such as the IMWG frailty index has not yet been evaluated. This is an important area for future research to gain a better understanding of potential classification errors. Finally, exact matches for Fried frailty criteria were not always available in the EORTC QLQ-C30 for use as proxy items. Specifically, we approximated the closest fit for “unintentional weight loss” as the EORTC QLQ-C30 item on appetite loss. This approximation was made in order to derive a fully patient-reported frailty measure that leveraged pre-existing PRO data. Future patient-reported frailty measures may be constructed from a more detailed search through the entire library of items within existing measurement systems such as EORTC and PROMIS. We were unable to explore adaptation of alternate PRO assessments because our study was limited to a data source which only incorporated the EORTC QLQ-C30. It should be noted that the benefit of a PRFP constructed from the EORTC QLQ-C30 is that it may already be deployed in clinical trials and thus additional information may be obtained from an already existing data source.

Conclusion

Interest in frailty-directed treatment optimization has been growing in oncology due to its potential for minimizing toxicity, improving tolerability, and lengthening duration of treatment to potentially improve survival. This is especially true for older adults who constitute a heterogenous patient population and for whom outcomes related to health-related quality of life and preserving functional independence are of particular importance. Given the significance of frailty at the individual and population levels, having a simple yet effective tool that can be easily deployed in diverse settings ranging from clinical practice to decentralized trials could be valuable. Our initial exploration of clinical trial data in MM patients suggests that it is feasible to leverage patients’ responses to selected items of the EORTC QLQ-C30 to measure frailty. Further research is required to assess the association between frailty—as measured by PRFP—and clinical outcomes in MM as well as evaluate PRFP’s performance against other widely used frailty measures such as the IMWG frailty index. Given that the EORTC QLQ-C30 is deployed in clinical trials across cancer contexts, additional opportunities exist to retrospectively analyze this example PRFP for other hematologic malignancies and solid tumors.